AITopics | content quality

Collaborating Authors

content quality

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Quantifying Label-Induced Bias in Large Language Model Self- and Cross-Evaluations

Saraf, Muskan, Boroujeni, Sajjad Rezvani, Beaudry, Justin, Abedi, Hossein, Bush, Tom

arXiv.org Artificial IntelligenceOct-13-2025

Large language models (LLMs) are increasingly deployed as evaluators of text quality, yet the validity of their judgments remains underexplored. This study investigates systematic bias in self- and cross-model evaluations across three prominent LLMs: ChatGPT, Gemini, and Claude. We designed a controlled experiment in which blog posts authored by each model were evaluated by all three models under four labeling conditions: no attribution, true attribution, and two false-attribution scenarios. Evaluations employed both holistic preference voting and granular quality ratings across three dimensions Coherence, Informativeness, and Conciseness with all scores normalized to percentages for direct comparison. Our findings reveal pronounced asymmetries in model judgments: the "Claude" label consistently elevated scores regardless of actual authorship, while the "Gemini" label systematically depressed them. False attribution frequently reversed preference rankings, producing shifts of up to 50 percentage points in voting outcomes and up to 12 percentage points in quality ratings. Notably, Gemini exhibited severe self-deprecation under true labels, while Claude demonstrated intensified self-preference. These results demonstrate that perceived model identity can substantially distort both high-level judgments and fine-grained quality assessments, independent of content quality. Our findings challenge the reliability of LLM-as-judge paradigms and underscore the critical need for blind evaluation protocols and diverse multi-model validation frameworks to ensure fairness and validity in automated text evaluation and LLM benchmarking.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.21164

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Industry: Government > Voting & Elections (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Retentive Relevance: Capturing Long-Term User Value in Recommendation Systems

Bakhshi, Saeideh, Nguyen, Phuong Mai, Schiller, Robert, Xu, Tiantian, Kodandapani, Pawan, Levine, Andrew, Simpson, Cayman, Wang, Qifan

arXiv.org Artificial IntelligenceOct-10-2025

Recommendation systems have traditionally relied on short-term engagement signals, such as clicks and likes, to personalize content. However, these signals are often noisy, sparse, and insufficient for capturing long-term user satisfaction and retention. We introduce Retentive Relevance, a novel content-level survey-based feedback measure that directly assesses users' intent to return to the platform for similar content. Unlike other survey measures that focus on immediate satisfaction, Retentive Relevance targets forward-looking behavioral intentions, capturing longer term user intentions and providing a stronger predictor of retention. We validate Retentive Relevance using psychometric methods, establishing its convergent, discriminant, and behavioral validity. Through large-scale offline modeling, we show that Retentive Relevance significantly outperforms both engagement signals and other survey measures in predicting next-day retention, especially for users with limited historical engagement. We develop a production-ready proxy model that integrates Retentive Relevance into the final stage of a multi-stage ranking system on a social media platform. Calibrated score adjustments based on this model yield substantial improvements in engagement, and retention, while reducing exposure to low-quality content, as demonstrated by large-scale A/B experiments. This work provides the first empirically validated framework linking content-level user perceptions to retention outcomes in production systems. We offer a scalable, user-centered solution that advances both platform growth and user experience. Our work has broad implications for responsible AI development.

artificial intelligence, proceedings, retentive relevance, (13 more...)

arXiv.org Artificial Intelligence

2510.07621

Country:

North America > United States > California (0.47)
North America > United States > New York > New York County > New York City (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Add feedback

QoNext: Towards Next-generation QoE for Foundation Models

Guo, Yijin, Zhang, Zicheng, Shen, Ye, Wen, Farong, Wang, Junying, Jia, Qi, Zhai, Guangtao

arXiv.org Artificial IntelligenceOct-10-2025

Existing evaluations of foundation models, including recent human-centric approaches, fail to capture what truly matters: user's experience during interaction. Current methods treat evaluation as a matter of output correctness alone, overlooking that user satisfaction emerges from the interplay between response quality and interaction, which limits their ability to account for the mechanisms underlying user experience. To address this gap, we introduce QoNext, the first framework that adapts Quality of Experience (QoE) principles from networking and multimedia to the assessment of foundation models. QoNext identifies experiential factors that shape user experience and incorporates them into controlled experiments, where human ratings are collected under varied configurations. From these studies we construct a QoE-oriented database and train predictive models that estimate perceived user experience from measurable system parameters. Our results demonstrate that QoNext not only enables proactive and fine-grained evaluation but also provides actionable guidance for productized services of optimizing foundation models in practice.

dimension, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.21889

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.93)
Leisure & Entertainment (0.92)
Education (0.67)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(5 more...)

Add feedback

Llama-Mimi: Speech Language Models with Interleaved Semantic and Acoustic Tokens

Sugiura, Issa, Kurita, Shuhei, Oda, Yusuke, Higashinaka, Ryuichiro

arXiv.org Artificial IntelligenceSep-19-2025

We propose Llama-Mimi, a speech language model that uses a unified tokenizer and a single Transformer decoder to jointly model sequences of interleaved semantic and acoustic tokens. Comprehensive evaluation shows that Llama-Mimi achieves state-of-the-art performance in acoustic consistency and possesses the ability to preserve speaker identity. Our analysis further demonstrates that increasing the number of quantizers improves acoustic fidelity but degrades linguistic performance, highlighting the inherent challenge of maintaining long-term coherence. We additionally introduce an LLM-as-a-Judge-based evaluation to assess the spoken content quality of generated outputs. Our models, code, and speech samples are publicly available.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2509.14882

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Assisting Research Proposal Writing with Large Language Models: Evaluation and Refinement

Ren, Jing, Wang, Weiqi

arXiv.org Artificial IntelligenceSep-15-2025

In this study, we employ ChatGPT -4o to generate academically sound, high-quality research proposals. T o evaluate the writing capabilities and potential of LLMs, we adopt both standard GPT -only and GPT -assisted writing approaches. T o effectively assess the writing capabilities of LLMs, we introduce two key evaluation metrics: content quality and reference validity . Additionally, we implement an iterative prompting method aimed at enhancing content quality and reducing inaccuracies and fabrications in references generated by LLMs. Our results show that the dual-metrics evaluation rigorously quantifies ChatGPT's writing capabilities, while iterative prompting enhances content quality, reduces errors, and addresses ethical concerns in reference generation. This proposal writing, evaluation, and improvement framework offers users a practical way to generate high-quality research proposals tailored to their needs. Future research can build upon this work by developing more efficient writing strategies and advanced methods to further enhance the writing capabilities of LLMs.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2509.09709

Country: Oceania > Australia > New South Wales > Sydney (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Assessment & Standards (0.69)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Entry Barriers in Content Markets

Zhu, Haiqing, Xie, Lexing, Cheung, Yun Kuen

arXiv.org Artificial IntelligenceSep-3-2025

The prevalence of low-quality content on online platforms is often attributed to the absence of meaningful entry requirements. This motivates us to investigate whether implicit or explicit entry barriers, alongside appropriate reward mechanisms, can enhance content quality. We present the first game-theoretic analysis of two distinct types of entry barriers in online content platforms. The first, a structural barrier, emerges from the collective behaviour of incumbent content providers which disadvantages new entrants. We show that both rank-order and proportional-share reward mechanisms induce such a structural barrier at Nash equilibrium. The second, a strategic barrier, involves the platform proactively imposing entry fees to discourage participation from low-quality contributors. We consider a scheme in which the platform redirects some or all of the entry fees into the reward pool. We formally demonstrate that this approach can improve overall content quality. Our findings establish a theoretical foundation for designing reward mechanisms coupled with entry fees to promote higher-quality content and support healthier online ecosystems.

artificial intelligence, machine learning, mechanism, (19 more...)

arXiv.org Artificial Intelligence

2509.01953

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Add feedback

RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing

Liao, Jianxing, Zhang, Tian, Feng, Xiao, Zhang, Yusong, Yang, Rui, Wang, Haorui, Wen, Bosi, Wang, Ziying, Shi, Runzhi

arXiv.org Artificial IntelligenceAug-29-2025

Large language models are extensively utilized in creative writing applications. Creative writing requires a balance between subjective writing quality (e.g., literariness and emotional expression) and objective constraint following (e.g., format requirements and word limits). Existing methods find it difficult to balance these two aspects: single reward strategies fail to improve both abilities simultaneously, while fixed-weight mixed-reward methods lack the ability to adapt to different writing scenarios. To address this problem, we propose Reinforcement Learning with Mixed Rewards (RLMR), utilizing a dynamically mixed reward system from a writing reward model evaluating subjective writing quality and a constraint verification model assessing objective constraint following. The constraint following reward weight is adjusted dynamically according to the writing quality within sampled groups, ensuring that samples violating constraints get negative advantage in GRPO and thus penalized during training, which is the key innovation of this proposed method. We conduct automated and manual evaluations across diverse model families from 8B to 72B parameters. Additionally, we construct a real-world writing benchmark named WriteEval for comprehensive evaluation. Results illustrate that our method achieves consistent improvements in both instruction following (IFEval from 83.36% to 86.65%) and writing quality (72.75% win rate in manual expert pairwise evaluations on WriteEval). To the best of our knowledge, RLMR is the first work to combine subjective preferences with objective verification in online RL training, providing an effective solution for multi-dimensional creative writing optimization.

constraint, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.18642

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

SGSimEval: A Comprehensive Multifaceted and Similarity-Enhanced Benchmark for Automatic Survey Generation Systems

Guo, Beichen, Wen, Zhiyuan, Yang, Yu, Gao, Peng, Yang, Ruosong, Shen, Jiaxing

arXiv.org Artificial IntelligenceAug-18-2025

The growing interest in automatic survey generation (ASG), a task that traditionally required considerable time and effort, has been spurred by recent advances in large language models (LLMs). With advancements in retrieval-augmented generation (RAG) and the rising popularity of multi-agent systems (MASs), synthesizing academic surveys using LLMs has become a viable approach, thereby elevating the need for robust evaluation methods in this domain. However, existing evaluation methods suffer from several limitations, including biased metrics, a lack of human preference, and an over-reliance on LLMs-as-judges. To address these challenges, we propose SGSimEval, a comprehensive benchmark for Survey Generation with Similarity-Enhanced Evaluation that evaluates automatic survey generation systems by integrating assessments of the outline, content, and references, and also combines LLM-based scoring with quantitative metrics to provide a multifaceted evaluation framework. In SGSimEval, we also introduce human preference metrics that emphasize both inherent quality and similarity to humans. Extensive experiments reveal that current ASG systems demonstrate human-comparable superiority in outline generation, while showing significant room for improvement in content and reference generation, and our evaluation metrics maintain strong consistency with human assessments.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.1131

Country:

Asia (0.95)
North America > United States (0.28)
Europe > Austria (0.28)

Genre:

Research Report (1.00)
Overview (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI

Paraschou, Eva, Arapakis, Ioannis, Yfantidou, Sofia, Macaluso, Sebastian, Vakali, Athena

arXiv.org Artificial IntelligenceJun-17-2025

Artificial Intelligence (AI) is rapidly embedded in critical decision-making systems, however their foundational ``black-box'' models require eXplainable AI (XAI) solutions to enhance transparency, which are mostly oriented to experts, making no sense to non-experts. Alarming evidence about AI's unprecedented human values risks brings forward the imperative need for transparent human-centered XAI solutions. In this work, we introduce a domain-, model-, explanation-agnostic, generalizable and reproducible framework that ensures both transparency and human-centered explanations tailored to the needs of both experts and non-experts. The framework leverages Large Language Models (LLMs) and employs in-context learning to convey domain- and explainability-relevant contextual knowledge into LLMs. Through its structured prompt and system setting, our framework encapsulates in one response explanations understandable by non-experts and technical information to experts, all grounded in domain and explainability principles. To demonstrate the effectiveness of our framework, we establish a ground-truth contextual ``thesaurus'' through a rigorous benchmarking with over 40 data, model, and XAI combinations for an explainable clustering analysis of a well-being scenario. Through a comprehensive quality and human-friendliness evaluation of our framework's explanations, we prove high content quality through strong correlations with ground-truth explanations (Spearman rank correlation=0.92) and improved interpretability and human-friendliness to non-experts through a user study (N=56). Our overall evaluation confirms trust in LLMs as HCXAI enablers, as our framework bridges the above Gaps by delivering (i) high-quality technical explanations aligned with foundational XAI methods and (ii) clear, efficient, and interpretable human-centered explanations for non-experts.

explanation, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2506.1224

Country:

Europe (1.00)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
(2 more...)

Add feedback

InteractiveSurvey: An LLM-based Personalized and Interactive Survey Paper Generation System

Wen, Zhiyuan, Cao, Jiannong, Wang, Zian, Guo, Beichen, Yang, Ruosong, Liu, Shuaiqi

arXiv.org Artificial IntelligenceApr-15-2025

The exponential growth of academic literature creates urgent demands for comprehensive survey papers, yet manual writing remains time-consuming and labor-intensive. Recent advances in large language models (LLMs) and retrieval-augmented generation (RAG) facilitate studies in synthesizing survey papers from multiple references, but most existing works restrict users to title-only inputs and fixed outputs, neglecting the personalized process of survey paper writing. In this paper, we introduce InteractiveSurvey - an LLM-based personalized and interactive survey paper generation system. InteractiveSurvey can generate structured, multi-modal survey papers with reference categorizations from multiple reference papers through both online retrieval and user uploads. More importantly, users can customize and refine intermediate components continuously during generation, including reference categorization, outline, and survey content through an intuitive interface. Evaluations of content quality, time efficiency, and user studies show that InteractiveSurvey is an easy-to-use survey generation system that outperforms most LLMs and existing methods in output content quality while remaining highly time-efficient.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.08762

Country: Asia > China (0.46)

Genre:

Questionnaire & Opinion Survey (1.00)
Overview (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback